Selecting The Most Highly Correlated Pairs Within A Large Vocabulary

Author

  • Kyoji Umemura
Abstract

Occurrence patterns of words in documents can be expressed as binary vectors. When two vectors are similar, the two words corresponding to the vectors may have some implicit relationship with each other. We call these two words a correlated pair. This report describes a method for obtaining the most highly correlated pairs of a given size. In practice, the computation time and memory space the method requires are determined by the number of documents or records. Since this does not depend on the size of the vocabulary under analysis, it is possible to compute correlations between all the words in a corpus.
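To make the idea of binary occurrence vectors and correlated pairs concrete, here is a minimal sketch. It is not the method the report describes: it scans every word pair by brute force, so its cost grows with the square of the vocabulary size, which is the cost the report's method is said to avoid. The Jaccard coefficient is an assumed similarity measure chosen only for illustration, and the function names (`occurrence_sets`, `jaccard`, `top_correlated_pairs`) and the toy documents are hypothetical.

```python
from itertools import combinations


def occurrence_sets(documents):
    """Map each word to the set of document indices in which it occurs.

    A set of indices is a sparse encoding of the word's binary occurrence vector.
    """
    occ = {}
    for doc_id, text in enumerate(documents):
        for word in set(text.lower().split()):
            occ.setdefault(word, set()).add(doc_id)
    return occ


def jaccard(a, b):
    """Jaccard similarity of two binary vectors stored as index sets (assumed measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def top_correlated_pairs(documents, k):
    """Brute-force scan over all word pairs; quadratic in vocabulary size."""
    occ = occurrence_sets(documents)
    scored = [
        (jaccard(occ[w1], occ[w2]), w1, w2)
        for w1, w2 in combinations(sorted(occ), 2)
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:k]


if __name__ == "__main__":
    docs = [
        "new york stock market",
        "york new city",
        "stock market crash",
        "city park",
    ]
    for score, w1, w2 in top_correlated_pairs(docs, 2):
        print(f"{w1} / {w2}: {score:.2f}")
```

In this toy corpus, "new" and "york" occur in exactly the same documents, as do "stock" and "market", so those two pairs have identical occurrence vectors and rank highest.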

Similar resources

Feature selection based on word–sentence relation

Feature selection proved to improve both the speed and the quality of classification. Methods such as mutual information, information gain or chi-square are all based on the joint distribution of classes and words; there exist only a few methods which exploit contextual information for feature selection. We introduce an algorithm based on word and word pair frequencies that reduces both vocabul...

The Relationship between Depth and Breadth of Vocabulary Knowledge and Reading Comprehension among Iranian EFL Learners

The current study is an attempt to investigate the particular role learners' vocabulary knowledge plays in their reading comprehension performance. It intends to determine whether breadth and depth of vocabulary knowledge are related to EFL learners' reading comprehension, and to investigate which one of these variables, that is, depth or breadth of vocabulary knowledge, makes a more important ...

Design and Evaluation of a PCR Method for Detecting White Spot Syndrome Virus in Shrimp

Background: White spot syndrome virus (WSSV) is the causative agent of white spot disease in shrimp and many crustaceans. This disease is highly contagious and can cause death within 3–10 days under normal culture conditions. Therefore, early diagnosis of the virus is a necessity. Materials and Methods: Primers were designed for three regions of the virus genome and one region of the shrimp geno...

Interactive Clustering Techniques for Selecting Speaker-Independent Reference Templates for Isolated Word Recognition

It is demonstrated that clustering can be a powerful tool for selecting reference templates for speaker-independent word recognition. We describe a set of clustering techniques specifically designed for this purpose. These interactive procedures identify coarse structure, fine structure, overlap of, and outliers from clusters. The techniques have been applied to a large speech data base consisti...

Word Type Effects on L2 Word Retrieval and Learning: Homonym versus Synonym Vocabulary Instruction

The purpose of this study was twofold: (a) to assess the retention of two word types (synonyms and homonyms) in short-term memory, and (b) to investigate the effect of these word types on word learning by asking learners to learn their Persian meanings. A total of 73 Iranian language learners studying English translation participated in the study. For the first purpose, 36 freshmen from an ...

Publication date: 2002